Improving the filter bank of a classic speech feature extraction algorithm
Abstract
The most popular speech feature extractor used in automatic speech recognition (ASR) systems today is the mel frequency cepstral coefficient (MFCC) algorithm. Introduced in 1980, the filter bank-based algorithm eventually replaced linear prediction cepstral coefficients (LPCC) as the premier front end, primarily because of MFCC's superior robustness to additive noise. However, MFCC does not approximate the critical bandwidth of the human auditory system. We propose a novel scheme for decoupling filter bandwidth from other filter bank parameters, and we demonstrate improved noise robustness over three versions of MFCC through HMM-based experiments with the English digits in various noise environments.
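To make the idea of decoupling bandwidth from filter placement concrete, here is a minimal sketch. It is not the authors' scheme, which the abstract does not specify. It assumes that center frequencies are spaced evenly on the mel scale, and that each filter's bandwidth is set separately from the Glasberg-Moore equivalent rectangular bandwidth (ERB) formula. The function names, default parameters, and the choice of triangles that are symmetric in Hz are all illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    """Common mel-scale mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def erb_hz(fc):
    """Glasberg-Moore ERB in Hz (assumed bandwidth rule, not from the abstract)."""
    return 24.7 * (4.37 * np.asarray(fc, dtype=float) / 1000.0 + 1.0)

def decoupled_filter_bank(n_filters=20, n_fft=512, sample_rate=8000,
                          f_min=100.0, f_max=3800.0, bw_scale=1.0):
    """Triangular filters whose width depends only on center frequency.

    In a conventional MFCC bank each triangle ends at its neighbours'
    centers, so bandwidth changes with n_filters and the frequency range.
    Here the half-width is bw_scale * ERB(center) instead.
    """
    # Frequencies of the one-sided FFT bins.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Centers: equally spaced in mel between f_min and f_max.
    centers = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters))
    # A unit-height triangle with half-width w has area w, so its
    # equivalent rectangular bandwidth equals w.
    half_widths = bw_scale * erb_hz(centers)
    distance = np.abs(freqs[None, :] - centers[:, None])
    bank = np.maximum(0.0, 1.0 - distance / half_widths[:, None])
    return centers, half_widths, bank

centers, half_widths, bank = decoupled_filter_bank()
print(bank.shape)  # (n_filters, n_fft // 2 + 1)
```

In this sketch, doubling `n_filters` packs the centers more densely but leaves the width of a filter at a given center frequency unchanged, which is the sense in which bandwidth is decoupled from the other filter bank parameters.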
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
The mel frequency cepstral coefficients (MFCC) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
Data-Driven Filter-Bank-based Feature Extraction for Speech Recognition
Selecting good features is especially important for achieving high speech recognition accuracy. Although the mel-cepstrum is a popular and effective feature for speech recognition, it is still unclear whether the filter bank in the mel-cepstrum is always optimal regardless of speech recognition environments or the characteristics of specific speech data. In this paper, we focus on the data-driven filt...
Closed-loop Auditory-based Representation for Robust Speech Recognition
A closed-loop auditory based speech feature extraction algorithm is presented to address the problem of unseen noise for robust speech recognition. This closed-loop model is inspired by the possible role of the medial olivocochlear (MOC) efferent system of the human auditory periphery, which has been suggested in [6, 13, 42] to be important for human speech intelligibility in noisy environment....
Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification by using feature selection and feature-space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...
Data Driven Design of Filter Bank for Speech Recognition
The filter bank approach is commonly used in the feature extraction phase of speech recognition (e.g. mel frequency cepstral coefficients). The filter bank is applied to modify the magnitude spectrum according to physiological and psychological findings. However, since the mechanism of the human auditory system is not fully understood, the optimal filter bank parameters are not known. This work presents a me...